Abstract

Access to previous results is of paramount importance in the scientific process. Recent
progress in information management focuses on building e-infrastructures for the optimization
of the research workflow, through both policy-driven and user-pulled dynamics. For
decades, High-Energy Physics (HEP) has pioneered innovative solutions in the field of
information management and dissemination. In light of a transforming information environment,
it is important to assess the current usage of information resources by researchers, and HEP
provides a unique test-bed for this assessment. A survey of about 10% of practitioners in the
field reveals usage trends and information needs. Community-based services, such as the
pioneering arXiv and SPIRES systems, largely answer the need of the scientists, with a limited
but increasing fraction of younger users relying on Google. Commercial services offered by
publishers or database vendors are essentially unused in the field. The survey offers an
insight into the most important features that users require to optimize their research workflow.
These results inform the future evolution of information management in HEP and, as these
researchers are traditionally "early adopters" of innovation in scholarly communication, can
inspire developments of disciplinary repositories serving other communities.
1 Introduction

High-Energy Physics (HEP), also known as Particle Physics, has a long record of innovation
in scholarly communication. Half a century ago, theoretical physicists and experimental col- 
laborations mailed to their peers hundreds, even thousands, of copies of their manuscripts. 
This occurred at the time of submission to peer-reviewed journals, whose speed in dissemi- 
nating scientific information was deemed to be insufficient for the speed at which the field was 
evolving [1]. This practice led to the creation of the first electronic catalog for gray literature,
later evolving into a catalog of the entire subject literature: the SPIRES database [2]. 

In the last two decades, crucial innovation in scholarly communication emerged from the 
HEP community, ranging from the invention of the world-wide web at CERN [3] to the 
inception of arXiv, the first and archetypal repository [4]. The onset of the web gave SPIRES 
the honor to be the first web server in America and the first database on the web [5]. More 
recently the HEP community inspired the development of Invenio, one of the first open-source
digital library software packages [6], currently used for repositories in many fields. 

Thanks to this suite of user-driven innovations, HEP scholars have used a variety of 
dedicated, field-specific "information resources". For many decades these have been run by 
large research institutions as a natural evolution of more conventional library services. At 
their inception, these resources often provided unique services, or were tailored specifically 
to the needs of the HEP community. Many of these services still exist and still provide 
information that cannot be obtained in any other way. 

For many years now almost all journal literature has been electronically available, the 
entire web is readily searchable, and commercial online databases provide metadata about 
all scientific literature. In addition, online services are changing more and more rapidly as 
new tools are developed and new ways of interacting with users evolve. In light of this 
fast-changing world, it is important to assess the usage by HEP researchers of HEP-specific 
information resources. Such a study serves two purposes: within the field, it informs on the 
need for such community-based resources and their real role in the present internet landscape, 
inspiring their future evolution; globally, it provides an in-depth case study of the impact of 
discipline-based information resources, as opposed to institution-based information resources 
or cross-cutting (commercial) information platforms. This information is particularly relevant 
in light of recent worldwide moves towards self-archiving of research results at the institutional 
or disciplinary level, and the need to effectively incorporate these resources in the research 
workflow. 

A survey of HEP scholars was designed and deployed in order to provide a unique in- 
sight into their information needs and the way their research workflow includes information 
discovery and retrieval. Its results are presented in this Article. The Article first describes 
the current landscape of HEP information resources (Section 2), then presents the survey 
methodology and the demographics of the respondents (Section 3). Two sets of results are 
presented and discussed: the information resources preferred by HEP researchers (Section 4) 
and their appreciation of the relative importance of possible features of information resources 
(Section 5). The survey also provides additional information on user requirements for the 
future of information resources (Section 6). After the conclusions of the study (Section 7), 
an Appendix presents some of the most inspiring free-text answers charting the future of 
information provision in this field. 



2 The Landscape of HEP Information Resources 

Several information resources serve the needs of HEP researchers, as summarized in the 
following. 

• arXiv [7]. arXiv is the archetypal repository. It was conceived in 1991 by Paul 
Ginsparg, then at the Los Alamos National Laboratory in New Mexico, and is now 
hosted at Cornell University in New York. It evolved a four-decade-old tradition of
HEP preprint circulation into an electronic-based system, offering all scholars a level
playing-field from which to access and disseminate information. Today arXiv counts 
nearly 500,000 preprints and has grown outside the field of HEP, becoming the refer- 
ence repository for many diverse disciplines beyond physics, from mathematics to some 
areas of biology. arXiv functions almost solely as a repository, with little emphasis on 
sophisticated searching or curation of bibliographic information. 

• CDS [8]. The CERN Document Server (CDS) was conceived in the late 90s at CERN, 
the European Organization for Nuclear Research in Geneva, Switzerland for the man- 
agement of scientific information at the laboratory [9]. It has a double role. On one 
side, it is CERN's institutional repository, with a mission to archive and disseminate 
CERN results in the fields of experimental and theoretical physics as well as accelera- 
tor and information technologies. On the other side, given the central role of CERN in 
research in the field, it expanded to also offer a gateway to HEP information at large, 
indexing the content of major journals and harvesting full-text from many preprint 
servers, with most of the content coming from arXiv. These efforts are more limited 
in scope and time than those at SPIRES, discussed below. CDS is based on the Inve- 
nio open-source digital library software [6]. CDS counts about 1 million records and 
500,000 full-text documents, as well as a growing multimedia collection.

• SPIRES [10]. SPIRES has provided a metadata-only search engine for all literature 
in the field for over 30 years. It is hosted at SLAC, the Stanford Linear Acceler- 
ator Center in California, and jointly compiled together with DESY, the Deutsches 
Elektronen-Synchrotron in Hamburg, Germany, and Fermilab, the Fermi National Ac- 
celerator Laboratory in Illinois. SPIRES adds citation data, keywords, classifications 
and authors with their institutional affiliations to the basic data that is harvested from 
various sources of physics literature. Today SPIRES has grown to include 750,000 
records. SPIRES functions primarily as a gateway to all information in the field, 
providing context and consolidation for other data sources. SPIRES also provides a 
corrections and additions capability so that authors and users can correct errors they 
might find. In addition, other information of interest to the HEP community is offered 
at the SPIRES site via databases of jobs, conferences, people and institutions [11]. 

A fourth, community-based, information resource also serves the HEP community, even 
though it was originally designed for the astronomy and astrophysics communities. 

• ADS [12]. The Astrophysics Data System (ADS) is a digital library portal for researchers
in astronomy and physics, operated by the Smithsonian Astrophysical Observatory
at Harvard under a NASA grant. It serves as a portal to astronomy and
astrophysics data as well as bibliographic records. It offers highly customizable query
forms and gives access to full-text scans of much of the astronomical literature, which
can be browsed or searched via a full-text search interface.

The relations that exist between the different HEP information resources are relevant to 
understanding the degree of interoperability and complementarity between them. In addition 
they help shed light on both the workflow of HEP researchers and the findings of the survey. 
The main relations are as follows: 

• ADS & arXiv. ADS maintains a database of all arXiv content relevant to HEP, offering 
additional services such as highly customizable e-mail alerts or RSS feeds. 

• CDS & arXiv. CDS carries all arXiv content relevant to HEP, with targeted curation 
effort devoted to matching preprints with information on publication reference, confer- 
ence contributions, experimental collaborations, and the use of local authority files for 
author disambiguation. 

• SPIRES & arXiv. Because of their similar histories and mostly non-overlapping func- 
tions, SPIRES and arXiv could be considered as a single system. arXiv functions as 
the back-end data storage, as well as managing all of the complexities of submission. 
SPIRES provides a front-end interface, as well as giving further context to the arXiv 
submissions by matching them with published literature and adding citation, keywords 
and other data.¹ Examples of their symbiosis include the fact that all of the arXiv
content of HEP relevance is indexed in SPIRES and arXiv relies on SPIRES for tasks 
like citation analysis. 

Like virtually everyone else with internet access, HEP scholars also use Google [13] and 
Google Scholar [14] as information resources. One of the targets of this study is indeed to 
assess the penetration of these resources in the HEP scholarly-communication landscape. It 
is important to remark that arXiv and SPIRES have let their content be harvested by Google 
and then partly organized in Google Scholar. 

3 Survey Methodology and Demographics 

The data discussed in this Article were obtained by an anonymous on-line survey widely 
distributed in the HEP community. The survey ran for 6 weeks, from April 30, 2007 to June 
11, 2007, and collected 2,115 responses. The number of respondents can be compared with 
the number of HEP physicists active in 2006, which is about 20,000 [15], or the number of 
authors who have published an article listed in SPIRES in the last decade, which is between 
30,000 and 40,000, depending on how one handles similar names. It can be safely concluded 
that between 5% and 10% of the HEP community participated in the survey. This high
rate of participation is complemented by the fact that 90% of the respondents wrote
some optional free-text comments in addition to the required "radio-button" selections, and
73% responded to optional lengthier questions. The engagement of the community is further
signified by the fact that about half of the respondents asked to be informed via e-mail of
the results of the poll.

¹ This is an oversimplified view, as arXiv.org does provide some searching capabilities and, since not all
literature is submitted to arXiv nor is all arXiv content HEP related, the data sets in the two services are not
identical.

This survey was promoted within the HEP community by e-mailing members of major 
experimental collaborations, users of major laboratories and authors of a major journal of 
the field: the Journal of High Energy Physics. A link to the survey was also distributed for a 
week as a heading in the popular daily e-mail alerts of arXiv. Prominent notices were posted 
on the CDS and SPIRES websites throughout the survey and for two weeks on the website 
of another major journal, Physical Review D. Information on the survey was also appended 
for two weeks to the correspondence between the editors and the authors of a third major 
journal, the European Physical Journal C. 

Table 1 presents the distribution of the respondents per country, which confirms the world- 
wide character of the HEP community and the worldwide spread of the survey. The fraction 
of answers per country mostly follows the distribution per country of the HEP authorship, as 
estimated in References 16 and 17, further confirming the absence of systematic biases in the 
response to the survey. The only appreciable trend is a reduced participation in the survey
from Asia. Japan, China, Korea and Taiwan account for about 15% of HEP authors [17], 
but they only comprise 5% of the respondents to the survey. 

Table 2 presents the distribution of the respondents per field of activity, with theorists 
accounting for about 60% of the total respondents. Table 3 presents the experience of the
respondents in the field: 76% have been active HEP scholars for 6 years or more. These data
reflect well the demographics of the HEP community observed in the SPIRES HEPNames 
database [18]. The respondents to the survey are heavy users of HEP information resources: 
as presented in Table 4, 82% use such resources a few times a week or more. 
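
As a rough check of the participation estimate quoted earlier in this section, the short Python sketch below divides the 2,115 responses by the three community-size estimates given in the text. The variable and function names are illustrative assumptions and are not part of the survey itself.

```python
# Illustrative check of the participation rate; all numbers are taken from the text.
RESPONSES = 2115

# Community-size estimates quoted in Section 3.
community_estimates = {
    "active HEP physicists in 2006": 20_000,
    "SPIRES authors, last decade (lower bound)": 30_000,
    "SPIRES authors, last decade (upper bound)": 40_000,
}


def participation_rates(responses, estimates):
    """Return the participation rate, in percent, for each community-size estimate."""
    return {label: 100.0 * responses / size for label, size in estimates.items()}


rates = participation_rates(RESPONSES, community_estimates)
for label, rate in rates.items():
    print(f"{label}: {rate:.1f}%")
```

The resulting rates run from a little over 5% to a little over 10%, in line with the range stated above.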

4 Preferred Systems 

The first question asked in the survey was: Which HEP information system do you use the 
most? The question, as all other questions discussed in this section, did not allow multiple 
choices, offering "radio buttons" for arXiv, CDS, Google, Google Scholar and SPIRES, as 
well as a free-text box for entering the name of another system. 1% of the respondents made 
use of this last possibility, mostly to refer to ADS, or to name two systems, typically arXiv 
and SPIRES, confirming the perception of these two as a single entity. 

The results are presented in Table 5 and Figure 1. Community-based systems, comprising 
ADS, arXiv, CDS, SPIRES and local library services, are the platform of choice for over 91% 
of the respondents. The combination of SPIRES and arXiv represents the vast majority of 
this fraction. 9% of the respondents use Google or Google Scholar, while commercial systems 
see a negligible use, around 0.1%. 

An interesting correlation with the seniority of HEP scholars is observed, whereby Google 
is the system of choice for 6% of scholars active in the field for 10 years or more, but for 
22% of scholars active in the field for 2 years or less. This trend is presented in Figure 2. 
It should be noted that the use of Google or Google Scholar benefits strongly from the fact 
that community-based systems have made their content available for harvesting. At the same 
time, Google and Google Scholar also act, as in many other fields, as a broader alternative to 
publisher portals, given that indexing of many publisher websites has taken place in recent 
years. 



Six further questions were asked to assess the use of different resources according to the 
tasks at hand: Which HEP information system do you use the most to find... 

preprints of which you know the reference? 
articles of which you know the reference? 
preprints on a given subject? 
articles on a given subject? 
preprints by a given author? 
articles by a given author? 

The answers are summarized in Table 5. These more specific questions reveal that the 
respondents change their behavior based on the task at hand, rather than operating only out 
of loyalty to a particular system. In particular their changes in usage seem to match some of 
the notable features of the available systems. As before, a trend towards larger usage of Google
by younger scholars is detected for these six questions, as summarized in Table 6. 

Figure 3 presents an aggregation of the use of community-based services, Google and 
commercial systems for the answers concerning searches for preprints and searches for articles. 
Again, community-based services dominate. 

As expected, the maximum usage of commercial services, and in particular publishers' 
websites, is observed in searches for articles whose reference is known. However, this number 
stays remarkably low, at 4.5%, confirming that community-based services are the preferred 
gateway to information, including the published literature, for HEP scholars. 

It is also interesting to remark that arXiv figures as the second favorite service in searches 
for articles. This is somewhat surprising as the site mission is the dissemination of preprints. 
However, HEP scholars routinely submit to arXiv an updated version of their preprints in 
an author-formatted post-peer-review version, and therefore make arXiv also a resource for 
published literature. Moreover, as the HEP content of arXiv is fully indexed by SPIRES, and 
for many users the distinction between the two is blurred, a user searching for a bibliographic 
reference in SPIRES, who clicks on the link to the arXiv version rather than the publisher 
version, would be inspired to answer that arXiv is her system of choice for such a search. 

In general, arXiv and SPIRES answer the needs of the vast majority of users, who do 
recognize the relative strengths and weaknesses of these two services as they move back and 
forth between them according to the task at hand. SPIRES is favored for journal literature, 
while arXiv increases its direct usage when preprints are desired. SPIRES is also more heavily 
favored when searching by author, possibly due to its more advanced author search and better 
author data. Subject searching, especially for published articles, sees a dramatic rise in the 
use of Google, possibly because a broader search may be desired, and the community-based 
systems do not have the breadth of coverage that Google has. There are several possible 
explanations for the lack of use of commercial systems: few institutions can afford access to 
them; if such access exists, most HEP scholars are not aware of such a possibility; even if 
they are, these systems do not provide detailed information specific to HEP users; even if 
such information is available, it is often lost within the "noise" of literature from many other 
fields. 

A final question on the preferred systems was: Which HEP information system do you 
use the most to find theses? The corresponding answers are presented in Table 5 and plotted 
in Figure 4. Unsurprisingly, Google has a larger share than for any other task, at about 1/3.
However, the efforts of community services to track and index theses are still reflected in 2/3
of the users preferring these services for accessing theses. Commercial systems, again, have
a negligible share, around 0.1%.

5 Important Features 

In addition to inquiring about the most heavily used systems for different tasks, the survey 
aimed to assess the importance of various aspects of information resources. Respondents 
were asked to tag the importance of 12 features of an information system on a five-step scale, 
ranging from "not important" to "very important". These features are:

Access to full text 
Citation analysis 
Collaborative tools 
Depth of coverage 
Keywords and classification 
Multimedia content 
Personalization 
Quality of content 
Search accuracy 
Speed to find what you want 
Submission interface 
User friendliness 

The results are presented in Table 7 and summarized in Figure 5. Notably, most features are 
felt to be important. 9 out of the 12 features were found to be important to over half of the 
respondents. Even multimedia content, the lowest rated feature, was found to be important 
by 20% of the users. 

Against this background of important features, access to full-text stood out clearly as the 
most valued feature, with only 5 respondents of the 1700 who answered this question rating 
it as not important. Following close behind full-text access are depth of coverage, quality of 
content and search accuracy. Citation analysis, a feature of many of the systems listed, was 
further down the list for most users. It was still considered important by most users, but it 
was clearly a secondary feature, along with user friendliness. 

Even if not all systems offer all these features, the perceived importance of each feature 
was found to be mostly independent from the system most used by the respondents. 

The survey included another set of questions, which were clearly labeled as optional, 
to further understand which additional features are considered important. Out of these 21 
additional features, 12 are particularly relevant and are discussed in the following: 

Access
  Finding theses
  Finding conference proceedings
  Finding articles cited with a given article
  Finding articles citing a given article
  Finding top-cited articles by subject

Community
  Finding conference announcements
  Annotating and commenting on documents
  Directory of authors and affiliations
  Retrieving list of publications

Authorship
  Retrieving and exporting article references
  Possibility of submitting article revisions
  Knowing how often your articles are read

The first five features concentrate on the access to information, the second four are part 
of a wider service to the community, while the last three are services tailored to authors. 
Respondents were asked to tag the importance of these features on a five-step scale, ranging 
from "not important" to "very important". The results are presented in Table 8 and Figure 6. 
Some of the services which are felt as moderately or very important by most respondents, such 
as the possibility of finding all articles citing a given article and the possibility of submitting 
a revised version of an article, are currently offered by SPIRES and arXiv, respectively. It 
is interesting to note that two of the other features which are perceived as moderately or 
very important by most respondents, the possibility of finding all articles cited with a given 
article and the possibility of knowing how often an article or preprint is downloaded, are not 
currently offered by the most widespread services. 

6 Winds of Change 

The survey explicitly inquired about the level of change that HEP scholars would expect 
and require from their information resources: 75% expected "some" to "a lot of" change in 
the next five years, while only 12% expected no change.² To structure this perception of
change, respondents were asked to imagine their ideal information system in five years and
tag the importance of 11 possible features on a five-step scale from "not important" to "very
important". These features are:

Access from your PDA
Access to data in figures and tables
Authoring tools
Centralization³
Collaborative tools
Connections to fields outside HEP
Inclusion of multiple types of documents
Linked presentation of all instances of a result⁴
Multimedia content
Personalization
Recommendation of documents of potential interest

² 90% of respondents answered this question.
³ The full text of the "Centralization" feature was "Centralization: one single portal to all the information".
⁴ The full text of the "Linked presentation" feature was "Linked presentation of all instances of a result,
from notes to theses, from conference slides to articles".



The results are presented in Table 9 and Figure 7. While "modern" features such as mul- 
timedia content or access from a PDA were not considered overwhelmingly important, about 
90% of the users tagged three features as important: the linked presentation of all instances 
of a result, the centralization and the access to data in figures and tables. Immediately fol- 
lowing these three is the extension of the level of service of HEP information systems to other, 
related, disciplines. The last is hardly surprising: SPIRES has long bridged the divide
towards astrophysics, cosmology and nuclear physics, following an increased interdisciplinary 
activity of HEP scholars. 

A final question tried to assess the potential for the implementation of Web 2.0 features to
capture user-tagged content. Respondents were asked: If a simple web interface would show
you an article and offer a set of categories to which it could belong, how much time would you
spend in this tagging system to give a service to the community? Of the 90% of respondents
who answered this question, 19% would not spend any time on this system, while 63% would 
spend between five minutes a day and an hour a week. The breakdown of these answers by 
the seniority of the respondents is presented in Figure 8. There is an immense potential for 
user-generated, or rather user-tagged and user-curated, content in the field of information 
provision in HEP, as in many other fields of web-based communication. 

7 Conclusions 

The response to the survey was overwhelming, with over 2,000 HEP scholars, representing 
about 10% of the community, answering basic and long questions, sharing their appreciation 
and vision for information management in the field. The large participation is per se a result, 
signifying the engagement of the community with its information resources. 

The main finding of the survey is that community-based services are overwhelmingly 
dominant in the research workflow of HEP scholars. Although the popularity of Google 
increases with younger researchers, the field-specific utility provided by these highly-tailored 
services is perceived as more relevant. Commercial systems are virtually unused in the field. 

While the various community-based systems have stronger and weaker features, users 
attach paramount importance to three axes of excellence: access to full-text, depth of coverage 
and quality of content. 

Future evolution of these systems should be charted by the clear desire of users for a 
centralized and coherent presentation of all instances of a scientific result, with access to 
data in figures and tables and a connection to fields outside of HEP. The survey shows that 
there exists a remarkable potential for capturing user-tagged content, with a large fraction 
of users willing to invest time in such a community service. 

The survey collected thousands of free-text answers about the most and least liked features 
of current systems and the user requirements for future evolution of information provision in 
the field. While a detailed study of these additional data is underway, and outside the scope 
of this Article, some inspiring answers are distilled in the Appendix. 

The results discussed in this Article confirm the exceptional situation of the HEP com- 
munity in the field of scholarly communication: decades of efforts in developing, maintaining, 
populating and curating community-based services enable an efficient research workflow for 
HEP scientists and are met by overwhelming user loyalty. Scholarly communication is at the 
dawn of a new era, with the onset of institutional repositories and author self-archiving of
research results. In this evolving landscape, could the decades-old success story of
community-based HEP information systems, and their discipline-based content aggregation,
provide inspiration for scholarly communication in other fields?

Acknowledgments 

First and foremost we wish to thank all HEP scholars who answered this survey, sharing their 
opinions, suggestions, wishes and constructive criticism on HEP information systems. We are 
grateful to our colleagues who shared their insights in the field of information management,
which were crucial in the preparation of the survey: Catherine Cart, Jocelyne Jerdelet, Jean- 
Yves Le Meur, Tibor Simko, Tim Smith, and Jens Vigen at CERN; Zaven Akopov and 
Kirsten Sachs at DESY; and Pat Kreitz and Ann Redfield at SLAC. This study would not 
have reached such a large audience without the collaboration of Paul Ginsparg and Simeon 
Warner at arXiv, Enrico Balli at SISSA/Medialab, Bob Kelly and Erick Weinberg at APS 
and Christian Caron at Springer, who kindly disseminated information about the survey, and 
to whom we are indebted. 






Appendix: Inspiring Free-Text Answers

In addition to the results presented above, the survey collected thousands of free-text answers,
inquiring about features of current systems and their most-desired evolution. A detailed study 
of these comments is underway and outside the scope of this Article. However, it is particu- 
larly interesting to distill some of these answers here, in order to complete the assessment of 
the engagement of the HEP community with the systems which serve its information needs 
and its expectations for future developments. Some of the most inspiring free-text answers 
were along the following lines: 

• Desire for seamless open access to older articles, prior to the onset of arXiv in the '90s. 

• Improved full-text search and access to research notes of large experimental collabora- 
tions. These are a crucial gray-literature channel where large amounts of information 
and details about the results of large experiments transit. 

• Indexing of conference talks and long-term archiving of the corresponding slides, beyond 
the lifetime of conference websites. Interlinking of these slides with the corresponding 
conference proceedings, in preprint form with reference to published volumes, and pos- 
sibly other instances describing the results. 

• Use of the HEP information resources as fora for the publication of ancillary material, 
crucial in the research workflow, and in particular: 

— numerical data corresponding to tables; 

— numerical data corresponding to figures; 

— correlation matrices and additional information beyond those presented in tables,
to allow an effective re-use of scientific results; 

— fragments of computer code accompanying complex equations in articles, to im- 
prove the research workflow and reduce the possibility of errors; 

— primary research data in the form of higher-level objects. 

• "Smarter" search tools, giving access to articles related to articles of interest. 

• Establishment of some new sort of open peer-review, overlaid on arXiv. 
Country            Fraction    Country            Fraction
United States      27.4%       Iran               0.?%
Germany            9.5%        Mexico             0.?%
Italy              7.7%        Australia          0.?%
United Kingdom     6.5%        Denmark            0.?%
CERN               4.9%        Sweden             0.?%
France             4.1%        Greece             0.?%
India              3.4%        Portugal           0.?%
Spain              3.0%        Argentina          0.7%
Canada             2.6%        Korea              0.7%
Brazil             2.4%        Austria            0.?%
Russia             2.4%        Poland             0.?%
Switzerland        2.2%        Chile              0.?%
China              2.1%        Finland            0.?%
Japan              1.7%        Taiwan             0.?%
Israel             1.5%        Czech Republic     0.?%
Netherlands        1.2%        Norway             0.?%
Belgium            1.1%        Hungary            0.?%
Turkey             0.9%        Others             4.?%

Table 1: Distribution of answers per country. Users based at CERN were asked to indicate
"CERN" and not "Switzerland". 97% of respondents answered this question.



Field of activity    Fraction
Theory               61.3%
Experiment           22.2%
Software             5.5%
Instrumentation      3.5%
Accelerators         2.7%
Engineering          1.3%
Others               3.5%

Table 2: Field of activity of respondents to the survey. ?% of respondents answered this
question.
How long have you used HEP search engines?    Fraction
>10 years                                     45.9%
6-10 years                                    30.0%
3-5 years                                     18.7%
0-2 years                                     5.4%

Table 3: Experience of respondents. 95% of respondents answered this question.



How frequently do you 




use HEP search engines? 


Fraction 


Every day 


57.0% 


A few times per week 


25.6% 


Once a week 


5.5% 


A few times per month 


7.4% 


Once a month 


2.0% 


A few times a year 


2.5% 



Table 4: Frequency of use of HEP search engines. 95% of respondents answered this question. 
                                       Fraction of users of
                                       Google and Google Scholar
Which system do you use the most...    > 10 years    < 2 years
in absolute                             6.0%         22.1%
for preprints (known reference)?        3.2%         15.4%
for articles (known reference)?         5.4%         16.2%
for preprints on a given subject?       9.3%         29.5%
for articles on a given subject?       17.5%         34.4%
for preprints by a given author?        3.1%         17.0%
for articles by a given author?         5.9%         20.1%
for theses?                            30.0%         41.0%

Table 6: Penetration of Google and Google Scholar as information resources in HEP as a 
function of the seniority of the scholars. 
[Pie chart: "Which HEP information system do you use the most?" 
Community-based systems (91.4%): SPIRES 48.2%, arXiv 39.7%, CDS 2.6%, ADS 0.7%, Library services 0.2%. 
Google (8.5%): Google 7.8%, Google Scholar 0.7%. 
Commercial systems (0.1%): Commercial databases 0.1%.] 

Figure 1: Favorite information resources for HEP scholars. The slice corresponding to 
commercial systems is enlarged for increased visibility. 
[Bar chart: "Which HEP information system do you use the most?" 
Vertical axis: 0% to 100%. 
Horizontal axis: duration of usage of HEP information systems (>10 years, 6-10 years, 3-5 years, 0-2 years). 
Legend: Community-based systems, Google, Commercial systems.] 

Figure 2: Categories of information resources used by HEP scholars as a function of their 
seniority in the field. 
[Pie chart: "Which HEP information system do you use the most to find a preprint?" 
Community-based systems 92.2%, Commercial systems 0.3%.] 

[Pie chart: "Which HEP information system do you use the most to find an article?" 
Commercial systems 2.7%.] 

Figure 3: Type of information resource most used by HEP scholars to access information in 
the form of preprints or published articles. 
[Pie chart: "Which system do you use the most to find theses?" 
Community-based systems (67.3%), including SPIRES 48.8%, Library service 2.4%, ADS 0.9%. 
Google (32.6%): Google 27.2%, Google Scholar 5.4%. 
Commercial systems (0.1%): Commercial database 0.1%.] 

Figure 4: Information resources most used by HEP scholars to search for theses. The slice 
corresponding to commercial systems is enlarged for increased visibility. 



[Figures 5, 6 and 7: rotated bar charts with percentage axes from 0 to 100; captions and labels are illegible in the source.] 



[Bar chart: "How much time would you spend in tagging articles through a web interface?" 
Vertical axis: 0% to 100%. 
Horizontal axis: duration of usage of HEP information systems (>10 years, 6-10 years, 3-5 years, 0-2 years). 
Legend: 30 minutes a week or more; None.] 

Figure 8: Interest in participating in user-tagging of content as a function of the seniority of 
respondents. 